
    Adaptive Sentence Boundary Disambiguation

    Labeling of sentence boundaries is a necessary prerequisite for many natural language processing tasks, including part-of-speech tagging and sentence alignment. End-of-sentence punctuation marks are ambiguous; to disambiguate them most systems use brittle, special-purpose regular expression grammars and exception rules. As an alternative, we have developed an efficient, trainable algorithm that uses a lexicon with part-of-speech probabilities and a feed-forward neural network. After training for less than one minute, the method correctly labels over 98.5% of sentence boundaries in a corpus of over 27,000 sentence-boundary marks. We show the method to be efficient and easily adaptable to different text genres, including single-case texts.
    Comment: This is a LaTeX version of the previously submitted ps file (formatted as a uuencoded gz-compressed .tar file created by a csh script). The software from the work described in this paper is available by contacting [email protected].
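    The disambiguation task described above can be sketched in miniature. The following is a simplified heuristic, not the paper's trained neural network: it uses a small hand-made abbreviation lexicon (hypothetical data) and capitalization of the following token to decide whether a period ends a sentence.

    ```python
    # Minimal illustrative sketch of sentence-boundary disambiguation.
    # NOT the paper's system: a heuristic stand-in for the trained
    # classifier, using a toy abbreviation lexicon (assumed data).

    ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc.", "vs."}

    def is_sentence_boundary(prev_token: str, next_token: str) -> bool:
        """Decide whether the period ending prev_token ends a sentence.

        Heuristic: a period is a boundary unless the token carrying it is
        a known abbreviation, or the next token starts in lower case
        (suggesting the sentence continues).
        """
        if prev_token.lower() in ABBREVIATIONS:
            return False
        if next_token and next_token[0].islower():
            return False
        return True

    def split_sentences(tokens: list[str]) -> list[list[str]]:
        """Group a token stream into sentences using the heuristic above."""
        sentences, current = [], []
        for i, tok in enumerate(tokens):
            current.append(tok)
            if tok.endswith("."):
                nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
                if is_sentence_boundary(tok, nxt):
                    sentences.append(current)
                    current = []
        if current:
            sentences.append(current)
        return sentences
    ```

    A trainable system like the paper's replaces the hand-written rules with features (here, the abbreviation lookup and capitalization) fed to a learned classifier, which is what makes it adaptable across genres.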

    Do peers see more in a paper than its authors?

    Recent years have shown a gradual shift in the content of biomedical publications that is freely accessible, from titles and abstracts to full text. This has enabled new forms of automatic text analysis and has given rise to some interesting questions: How informative is the abstract compared to the full text? What important information in the full text is not present in the abstract? What should a good summary contain that is not already in the abstract? Do authors and peers see an article differently? We answer these questions by comparing the information content of the abstract to that in citances: sentences containing citations to that article. We contrast the important points of an article as judged by its authors versus as seen by peers. Focusing on the area of molecular interactions, we perform manual and automatic analysis, and we find that the set of all citances to a target article not only covers most information (entities, functions, experimental methods, and other biological concepts) found in its abstract, but also contains 20% more concepts. We further present a detailed summary of the differences across information types, and we examine the effects other citations and time have on the content of citances.
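    Collecting citances is itself a simple text-processing step. The sketch below illustrates one way to do it; the parenthetical citation pattern "(Author, Year)" and the naive sentence splitter are assumptions for demonstration, not the paper's pipeline, and real corpora need richer citation parsing.

    ```python
    import re

    # Illustrative sketch: extract "citances" (sentences citing a target
    # article) from running text. The citation regex is an assumption
    # for demonstration purposes only.
    CITATION = re.compile(r"\([A-Z][A-Za-z]+(?: et al\.)?,? \d{4}\)")

    def extract_citances(text: str) -> list[str]:
        """Return the sentences that contain a parenthetical citation."""
        # Naive sentence split on '.', '!' or '?' followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        return [s for s in sentences if CITATION.search(s)]
    ```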

    WordSeer: A Text Analysis Environment for Literature Study

    This project builds on the success of a Digital Humanities Start-Up Grant (HD-51244-11) to produce a software environment for literary text analysis. Literature study is a cycle of reading, interpretation, exploration, and understanding. Called WordSeer, this software system integrates tools for automated processing of text with interaction techniques that support the interpretive, exploratory, and note-taking aspects of scholarship. Development of the tool follows best practices in user-centered design and evaluation. At present, the system supports grammatical search, contextual similarity determination, and visualization of patterns of word context. This implementation grant will allow for incorporating additional tools to aid comparison, exploration, grouping, and hypothesis formation, and to make the software more robust and therefore sharable and usable by a wide community of scholars.

    Why More Text is (Often) Better: Themes from Reader Preferences for Integration of Charts and Text

    Given a choice between charts with minimal text and those with copious textual annotations, participants in a study (Stokes et al.) tended to prefer the charts with more text. This paper examines the qualitative responses of the participants' preferences for various stimuli integrating charts and text, including a text-only variant. A thematic analysis of these responses resulted in three main findings. First, readers commented most frequently on the presence or lack of context; they preferred to be informed, even when it sacrificed simplicity. Second, readers discussed the story-like component of the text-only variant and made little mention of narrative in relation to the chart variants. Finally, readers showed suspicion around possible misleading elements of the chart or text. These themes support findings from previous work on annotations, captions, and alternative text. We raise further questions regarding the combination of text and visual communication.
    Comment: 7 pages, 3 figures, accepted to the NLVIZ workshop at the IEEE Transactions on Visualization and Graphics conference.

    Can Natural Language Processing Become Natural Language Coaching?

    How we teach and learn is undergoing a revolution, due to changes in technology and connectivity. Education may be one of the best application areas for advanced NLP techniques, and NLP researchers have much to contribute to this problem, especially in the areas of learning to write, mastery learning, and peer learning. In this paper I consider what happens when we convert natural language processors into natural language coaches.

    Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation

    We present a probabilistic model that uses both prosodic and lexical cues for the automatic segmentation of speech into topically coherent units. We propose two methods for combining lexical and prosodic information using hidden Markov models and decision trees. Lexical information is obtained from a speech recognizer, and prosodic features are extracted automatically from speech waveforms. We evaluate our approach on the Broadcast News corpus, using the DARPA-TDT evaluation metrics. Results show that the prosodic model alone is competitive with word-based segmentation methods. Furthermore, we achieve a significant reduction in error by combining the prosodic and word-based knowledge sources.
    Comment: 27 pages, 8 figures
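    One common way to fuse two knowledge sources like these is a log-linear interpolation of their boundary probabilities. The sketch below illustrates that idea in its simplest form; the weight and scores are hypothetical, and the paper itself combines the cues with hidden Markov models and decision trees rather than this shortcut.

    ```python
    import math

    # Illustrative sketch: log-linearly fuse a lexical boundary
    # probability with a prosodic one. Weights are assumed values,
    # not parameters from the paper.

    def combined_boundary_score(p_lexical: float, p_prosodic: float,
                                weight: float = 0.5) -> float:
        """Log-linear interpolation of two boundary probabilities."""
        return math.exp(weight * math.log(p_lexical)
                        + (1.0 - weight) * math.log(p_prosodic))

    def is_topic_boundary(p_lexical: float, p_prosodic: float,
                          threshold: float = 0.5) -> bool:
        """Declare a topic boundary when the fused score clears a threshold."""
        return combined_boundary_score(p_lexical, p_prosodic) >= threshold
    ```

    With equal weights this is a geometric mean, so a boundary is declared only when both models lend it reasonable support, which mirrors the error reduction the abstract reports from combining the two sources.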

    Improving the Recognizability of Syntactic Relations Using Contextualized Examples

    A common task in qualitative data analysis is to characterize the usage of a linguistic entity by issuing queries over syntactic relations between words. Previous interfaces for searching over syntactic structures require programming-style queries. User interface research suggests that it is easier to recognize a pattern than to compose it from scratch; therefore, interfaces for non-experts should show previews of syntactic relations. What these previews should look like is an open question that we explored with a 400-participant Mechanical Turk experiment. We found that syntactic relations are recognized with 34% higher accuracy when contextual examples are shown than a baseline of naming the relations alone. This suggests that user interfaces should display contextual examples of syntactic relations to help users choose between different relations.